Acta Psychologica Sinica Manuscript Self-Check Report


Please complete the following items and paste this report on the first page of the manuscript.


1. Please list at most three innovative contributions of this study in the form of "research highlights", totaling no more than 200 characters.

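The goal of Acta Psychologica Sinica is to publish cutting-edge psychological research that is "both scientifically excellent and of particularly broad interest and significance". If your research makes only incremental contributions and does not attempt to open new areas of inquiry or offer unique and innovative perspectives, and especially if it is purely a study of algorithms or techniques without a clear psychological question, it has little chance of being accepted by this journal, and we suggest submitting it elsewhere.
Answer: The study demonstrates the ERP component corresponding to the form (structure) analysis system in visual processing.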


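2. Have the authors submitted or published any articles that use the same data as this study? If so, please attach them. (We do not endorse publishing multiple articles with the same variables from the same data, nor splitting a series of related studies into several separate publications.)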
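3. For non-experimental, non-intervention studies in management, clinical, personality, social, and related fields that rely solely on self-report (questionnaires), the data must be checked for common method bias. What methods did you use, and what measures did you take, to control for this bias or to demonstrate that it does not affect the validity of the conclusions? (For literature on common method bias, see http://journal.psych.ac.cn/xlkxjz/CN/abstract/abstract894.shtml) Studies based on cross-sectional data, relying solely on self-report, and administered only to convenience samples yield easily obtained data but usually have little innovative value and little chance of being accepted by this journal.

Answer: Not a questionnaire study.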


4. Are effect sizes reported and analyzed (e.g., t test: Cohen's d; ANOVA: η²; standardized regression coefficients)? (Many studies merely report effect sizes mechanically without the necessary analysis or explanation, e.g., whether the effect is large, medium, or small, and what its theoretical or practical significance is.) (Searching for "effect size calculator" in Google turns up many convenient apps. For explanations of effect sizes, see in Chinese: http://journal.psych.ac.cn/xlkxjz/CN/abstract/abstract1150.shtml; in English: http://www.uccs.edu/lbecker/effect-size.html)

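Are 95% confidence intervals (CIs) reported for the statistical analyses (e.g., the 95% CI of a difference; the 95% CI of a correlation or regression coefficient)? (For computing and plotting confidence intervals, see https://thenewstatistics.com/itns/esci/)

Answer: Yes; yes.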


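5. Please state the planned and the actual sample sizes. If they differ, please explain why. Psychological research has commonly suffered from low statistical power caused by insufficient sample sizes, so we recommend explaining in the Methods section how the sample size was calculated and justified. The sample size should be determined from a justified effect size and the desired power, and the software or procedure used for the calculation should be reported. For the rationale and procedures of sample size planning, see https://osf.io/Sawp4/
Answer: Planned sample size, 20; actual sample size, 11 (1 participant completed the EEG experiment in the evening, 3 participants did not understand the task instructions, and 5 participants had more than 25% invalid trials).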


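6. For null hypothesis significance testing (NHST), exact p values rather than ranges must be reported (report a range only for p < 0.001; otherwise, report exact p values). Does your paper meet this requirement? If Bayes factors are used, has their sensitivity to the assumed prior distribution been reported?

Answer: Yes, it complies.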


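7. To ensure complete data reporting: if some data were excluded from the statistical analysis, is this reported in the paper? What were the reasons? How do the results change when these data are included? How were missing data handled? If scales were used, were any individual items deleted? Why? How would the results change if these items were included? Were any measured items or variables left unreported? Why? Please indicate where in the paper these points are addressed.
Answer: This is not a questionnaire study, so the above questions do not apply.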


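8. Are any experimental materials, scales, or questionnaires used in the study that have not been peer-reviewed appended to the end of the manuscript for review? If not, please explain why. If the paper is published, are you willing to make these materials public and share them with other researchers?
Answer: Some of the experimental materials are provided in the paper; if the paper is published, we are willing to make the materials public.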


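9. This journal requires authors to provide the raw data. Please tick one of the following three options:

a) Send the data to the editorial office email after submission (✓)

b) The data can be obtained from the following [...] ( )

c) The raw data and programs have been shared on the Psychological Science Data Bank (https://psych.scidb.cn/) ( )

If the data cannot be provided, please explain the reason or provide the relevant supporting documentation.


10. Is your study a clinical intervention or a laboratory experiment? If so, please provide the preregistration [...]. If not, please explain why.

Note: for clinical interventions or laboratory experiments, we recommend [...] other experimental studies to be preregistered (pre-register). Preregistration [...] hypotheses and their support, as well as the experimental [...]. The journal's preregistration website is https://os.psych.ac.cn/preregister ([...] "Download Center"); https://osf.io/ or https://aspredicted.org/ may also be used. [...] all studies [...]. For reference, see https://osf.io/Sawp4/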


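11. If your study involved human or animal participants, was it approved by the ethics committee of your institution? [...] send the scanned copy to the editorial office [...].

Answer: No; our institution has not established an ethics committee.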


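12. Have you written an extended English abstract of 400-500 words following the "Notes on Writing English Abstracts" posted on the editorial office website? Have the English title and abstract been checked by a professional with strong English, or edited and polished by a professional SCI/SSCI editing company?
Answer: The manuscript is written entirely in English.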


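13. If the first author is a student, please ask the supervisor to send a separate email to the editorial office (xuebao@psych.ac.cn) stating that he or she has read this paper and checked it carefully. Have you reminded the supervisor to email the editorial office? (The editorial office will begin processing the manuscript only after receiving the supervisor's email.)

Answer: Not a student.


14. Please go to the "Download Center" on the editorial office website homepage, download and complete the "Certificate That the Manuscript Involves No Classified Information", have it stamped with the official seal of the confidentiality office of the corresponding author's institution, and send the scanned copy to the editorial office email (xuebao@psych.ac.cn). If there is no confidentiality office, the official seal of the corresponding author's institution may be used. Has the email been sent?
Answer: Yes.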
Form analysis system: An EEG study of object, word, and Greeble recognition

Abstract:
Objectives: The form analysis system offers an efficient account of how objects are encoded for recognition within a frame-and-fill model. However, little is known about its neural basis. The present study used EEG to narrow this gap.
Methods: Participants passively viewed six types of images: geometric shapes, headless animal bodies, plants, Chinese words, English words, and Greebles.
Results: A shared negativity over occipital sites from 100 to 200 ms was observed for the three object domains (geometric figures, animal bodies, and plants), but not for Chinese characters, English words, or Greebles.
Conclusion: The form analysis system was engaged by geometric figures, bodies, and plants, but not by words or face-like stimuli. These results suggest that stimuli with a medial-axis structure elicit similar negative waves in the human brain. Our study sheds new light on the human visual system by providing evidence for a form analysis system. Understanding the neural patterns of this system advances our understanding of visual object recognition and could inform progress not only in research on human visual cognition but also in machine vision.
Keywords: Form analysis system; Objects; Words; Faces; EEG 

“Seeing is a constructive process in which the brain responds in parallel to many different features of the visual scene and attempts to combine them into meaningful wholes, using its past experience as a guide.”
--Francis Crick 
There is a continuum between visual and semantic processing: visually dissimilar shapes may be identified as members of the same category, while different linguistic labels may refer to similar visual appearances. The inferior temporal (IT) cortex, traditionally associated with visual object recognition, is also known to encode semantic dimensions (Khaligh-Razavi & Kriegeskorte, 2014). This highlights the challenge of distinguishing between visual and semantic categorical representations. However, converging evidence shows that visual encoding begins in early visual cortices (e.g., V1, V2, V3), whereas semantic information is processed in higher-level visual areas (e.g., IT). This suggests that visual encoding is a more fundamental process that precedes semantic processing. Numerous theories have sought to identify the fundamental visual elements of object recognition; they generally fall into three main classes: global Gestalt perception (Rennig et al., 2015), shared converging features (Coutanche & Thompson-Schill, 2014), and statistical Bayesian inference (Erdogan & Jacobs, 2017). Recently, a new account has been proposed: the form analysis system (Spelke, 2022). Although many studies have demonstrated the importance of form in object categorization, few researchers considered the form analysis system to be innate until Spelke (2022) proposed that infants and adults are predisposed to recognize and categorize objects on the basis of shape-structure descriptions that capture the characteristic forms of plants and animals.


Ever since Blum introduced the medial axis transform (MAT) in 1973, skeletal representations of form have played a prominent role in theories of visual shape. Recently, a skeleton-based sign language recognition and generation framework has been proposed to support bidirectional communication between deaf and hearing people (Xiao, Qin, & Yin, 2020). Skeletal information is so communicative that these skeleton-based signs can be identified effortlessly without relying on language. Objects that differ in skeletal structure are also encoded very rapidly. Using high-density electroencephalography (EEG), researchers found that animate and inanimate categories could be classified around 60 ms after stimulus onset, and some individual exemplars could be distinguished even earlier, around 40 ms (Gurariy, Mruczek, Snow, & Caplovitz, 2022). Why are we so sensitive to skeleton-based information? Is there a special system that analyzes skeletal properties across categories? Drawing on evidence from different domains, including objects, diverse writing systems, and social face stimuli, the present study used EEG to investigate whether a form analysis system exists.
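As a concrete, simplified illustration of the medial axis idea, the Python sketch below approximates a shape's skeleton on a small binary grid as the ridge of its Euclidean distance map: foreground cells that lie at least as far from the contour as all of their neighbours. This is an assumed discrete approximation written for exposition (the function names are hypothetical), not Blum's continuous construction and not the procedure used to build our stimuli; for real images, scikit-image provides `skimage.morphology.medial_axis`.

```python
import math


def distance_map(mask):
    """Euclidean distance from every foreground cell ('#') to the nearest
    background cell. The grid should be padded with background on all sides."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] != "#"]
    dist = {}
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == "#":
                # Brute-force search; adequate for small illustrative grids only.
                dist[(r, c)] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def medial_axis(mask):
    """Approximate medial axis: foreground cells whose distance value is at least
    as large as that of every foreground neighbour (8-connectivity), i.e. the
    ridge of the distance map."""
    dist = distance_map(mask)
    skeleton = set()
    for (r, c), d in dist.items():
        neighbours = [
            dist[(r + dr, c + dc)]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in dist
        ]
        if all(d >= n for n in neighbours):
            skeleton.add((r, c))
    return skeleton


# A 3 x 7 bar padded with one cell of background on every side.
bar = [
    ".........",
    ".#######.",
    ".#######.",
    ".#######.",
    ".........",
]
print(sorted(medial_axis(bar)))  # [(2, 2), (2, 3), (2, 4), (2, 5), (2, 6)]
```

For the padded bar, only the central row survives; at this coarse resolution the short branches that the continuous transform sends toward the corners are not recovered.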
Form Analysis in Objects

Objects in the physical world can be abstractly represented as geometry. The human mind is equipped with a form analysis system that analyzes surrounding visual objects in terms of geometric shapes (Wilder et al., 2019). When adults were instructed to tap anywhere they wanted on a figure, their taps clustered along the medial axis, the skeletal form of the shape (Psotka, 1978; Firestone & Scholl, 2014; Ayzenberg et al., 2019). This effect persists even when small perturbations are introduced to the shape contours (Feldman & Singh, 2006; Ayzenberg et al., 2019). However, when asked to predict the tapping results, few adults showed awareness of these geometric forms: one-third chose the chance model, while only 3% correctly identified the medial axis (Firestone & Scholl, 2014).

Although the form analysis system may not be consciously recognized, it appears to be shared across ages and cultures. Researchers compared geometric knowledge among adults and children (aged 6-10 years) from urban communities (in the United States or France) and from an isolated community in the Brazilian Amazon, the Mundurucu (Dehaene, Izard, Pica, & Spelke, 2006; Izard, Pica, & Spelke, 2021). When asked to locate geometric deviants in panels of six forms with variable orientations, adults and children from both urban and rural communities showed strong similarities in their geometric intuitions: the same problems were difficult at all ages and in both cultures, suggesting that form analysis is a universal aspect of the human mind.

The form analysis system not only exists in literate and illiterate populations but is also shared by newborn humans and non-human animals. Newborns' preferential looking between pairs of stimuli varying in real size and viewing distance was determined only by retinal size, suggesting that an invariant form representation can be abstracted from the size-changing appearance on the retina (Slater, Mattock, & Brown, 1990). In a follow-up experiment, newborns were familiarized with an object presented at varying distances, so that its retinal size changed across familiarization trials. Subsequently, they strongly preferred a different-sized object over the familiar one, indicating that their representation captured the object's real size across varied retinal sizes rather than its retinal size alone. These results suggest that newborn visual systems can begin building form-invariant object representations from the onset of visual experience. Evidence from newborn animals reveals similar results. Using the imprinting response, Wood (2013) tested newborn chickens' object recognition abilities without training. During the input phase, the imprinted object was displayed from a single 60° viewpoint range, balanced across the left and right display walls. The test phase examined whether the chickens could recognize the virtual object across changes in viewpoint. Distinguishing between objects from novel viewpoints requires an invariant form representation that can generalize across large, novel, and complex changes in the object's appearance. The results showed that, from the onset of visual experience, newborn chickens generate form-invariant object representations across changing viewpoints.

Furthermore, the form analysis system is not limited to geometric shapes but extends to bodies and plants. Gunnar Johansson (1950) attached light bulbs to an experimenter's joints and presented adult participants with these biological-motion point-light displays in the dark. When the experimenter walked, participants immediately perceived the spatially separate point lights as a human in motion. This finding revealed that participants processed these isolated light dots as a solid body consisting of limbs and joints, consistent with a skeletal form analysis system. As with bodies, humans are also sensitive to the biological motion of plants in point-light displays (Cutting, 1982). This sensitivity appears to be present from infancy. Researchers designed experiments with plants that had either a natural or an unnatural growing structure, with or without leaves (Sarmiento & Spelke, in process). They found that children took longer to touch plants with natural growing structures and those with leaves, indicating an innate sensitivity to biologically relevant forms.

Form Analysis in Different Writing Systems

Beyond the evolutionarily ancient object categories shared with other animals, humans have developed special symbols: written words. Reading is one of the most practiced abilities in modern societies. With extensive practice, the huamn mnid raed wrods as a wlohe (Grainger & Whitney, 2004). Although the letter positions in the previous sentence are transposed, its meaning is still easy to understand. This phenomenon is known as the "transposed-letter effect".

Converging evidence suggests that word identification in Indo-European languages is tolerant of imprecise letter positions. Researchers compared electrophysiological responses across four conditions: transposed-letter nonword-word pairs (e.g., “wlohe-whole”), transposed-letter word-word pairs (e.g., “calm-clam”), substituted-letter nonword-word pairs (e.g., “sitinar-similar”), and substituted-letter word-word pairs (e.g., “soft-salt”). They found that only the substituted-letter nonword-word pairs elicited a more negative waveform between 150 and 250 ms, identified as the N250, a component sensitive to form-level processing (Duñabeitia et al., 2009). These results suggest that transposed-letter nonwords are more likely to escape rejection as nonwords, as also demonstrated in behavioral visual lexical decision tasks (Andrews, 1996; Chambers, 1979). Similarly, priming studies show that, relative to a substitution prime (e.g., sedlice-SERVICE), a transposed-letter prime (e.g., sevrice-SERVICE) speeds recognition of the target word (Schoonbaert & Grainger, 2004). This transposed-letter effect extends to cases in which the transposition crosses a syllable boundary (e.g., snawdcih-sandwich; Guerrera & Forster, 2008) and to more extreme modifications (e.g., R34DING WORDS W1TH NUMB3R5S; Perea, Duñabeitia, & Carreiras, 2008). These findings suggest a high degree of perceptual similarity between words and nonwords composed of the same letters in different positions. Because each letter has a unique form, this perceptual similarity can be explained by the form analysis system.
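The contrast between transposed-letter and substituted-letter primes can be made concrete with two toy similarity measures. The sketch below is an illustrative assumption rather than a model tested in this study: it compares a position-specific (slot) code with a position-tolerant open-bigram code, one of the letter-position schemes discussed in this literature. The function names are hypothetical.

```python
from itertools import combinations


def open_bigrams(word):
    """Set of ordered letter pairs (i < j), ignoring the gap between the letters."""
    return {a + b for a, b in combinations(word.lower(), 2)}


def position_match(prime, target):
    """Proportion of target letters matched by the prime in the same slot."""
    return sum(p == t for p, t in zip(prime.lower(), target.lower())) / len(target)


def bigram_overlap(prime, target):
    """Proportion of the target's open bigrams that also occur in the prime."""
    target_bigrams = open_bigrams(target)
    return len(open_bigrams(prime) & target_bigrams) / len(target_bigrams)


target = "SERVICE"
for prime in ("sevrice", "sedlice"):  # transposed-letter vs substituted-letter prime
    print(prime, round(position_match(prime, target), 2), round(bigram_overlap(prime, target), 2))
# sevrice 0.71 0.95
# sedlice 0.71 0.45
```

Both primes match SERVICE in five of seven slots, so a strictly position-specific code cannot explain the transposed-letter advantage; the open-bigram code keeps 19 of the target's 20 ordered letter pairs for sevrice but only 9 for sedlice.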

In most writing systems, words are written in a specific direction and orientation. Besides letter position, orientation is also crucial for word perception. In the Chinese writing system, characters composed of the same stroke pattern in different orientations can represent different meanings (e.g., “甲” means excellent, “由” means by, “陪” means accompanying, and “部” means ministry; Zhang, Ni, & Li, 2020). For young Chinese readers, these characters can be visually confusing. In the English writing system, some letters share geometric forms that are mirror images of each other, such as “b” and “d” or “p” and “q” (Freud, Behrmann, & Snow, 2020). The inversion effect is another aspect of orientation (Diamond & Carey, 1986; Gauthier, Williams, Tarr, & Tanaka, 1998; Valentine, 1988; Yin, 1969). Researchers investigated the inversion effect in Chinese character processing (Kao, Chen, & Chen, 2010) and found that the proportion of correct responses for matching real characters was significantly reduced when the characters were turned upside down (96% vs. 93%), whereas no significant difference was observed between upright and inverted non-characters (95% vs. 94%). Using a novel visual perspective-taking task, Surtees and collaborators (2012) asked children and adults to judge the appearance of numerals, such as 6 and 9, that look different when inverted. When a numeral was presented on the wall, it appeared the same to both the participant and an avatar; when it was presented on the table, it appeared inverted from the avatar's perspective. All participants had more difficulty recognizing a numeral when it was inverted than when it was upright.

By acquiring sensitivity to the orientation of characters, the form analysis system may help children distinguish mirrored and inverted characters more effectively.
Form Analysis in Faces

Face recognition and visual word recognition are both examples of expert perceptual skills acquired through years of practice. Holistic processing, in which individuals have difficulty ignoring irrelevant parts of a face while attending to a selected part, is a hallmark of form analysis in face recognition (Tanaka & Farah, 1993; Young, Hellawell, & Hay, 1987). Both the face inversion effect and the part-whole effect reflect this holistic mechanism. The Thatcher illusion (Thompson, 1980), a demonstration of the inversion effect, shows that when a face's eyes and mouth are inverted while the rest of the face remains upright, the face appears distorted only when viewed upright, not when the whole image is inverted. This vulnerability to orientation suggests that these facial features constitute the core form structure for facial identification. The part-whole effect (Tanaka & Simonyi, 2016), in turn, demonstrates that individuals are better at recognizing facial features, such as the eyes, mouth, or nose, when they are presented within the original face than when they are presented in isolation or within a different face. This whole-face advantage suggests that face parts are not processed as standalone features but as integral parts of the form of the whole face.

Tarr and his colleagues designed a new type of artificial object, the “Greeble”, which shares a face-like configuration with a geometric body (Gauthier & Tarr, 1997). Greebles were intended as control stimuli for face studies: the logic behind studies using Greebles is to test whether non-face objects can produce behavioral or neural effects similar to those produced by faces. Researchers found that Greebles were processed more holistically by experts than by novices (Gauthier & Tarr, 1997, 2002), suggesting that Greebles can be perceived as face-like stimuli at the behavioral level. Greeble processing also shares a neural basis with face processing. The fusiform face area (FFA), a brain region specialized for faces, was activated when participants passively viewed Greebles (Gauthier & Tarr, 2002; Gauthier et al., 1999). Additionally, the face-sensitive ERP component N170 was observed when participants viewed Greebles (Rossion et al., 2000). Furthermore, evidence from patients suggests that Greebles are processed similarly to faces and share similar mechanisms (Gauthier et al., 1999; but see Gauthier, Behrmann, & Tarr, 2004).

We used Greebles as face-like stimuli in this study for two reasons. First, studies involving Greebles provide converging evidence that face-specific effects can be obtained with visually similar non-face objects. Second, the facial features of Greebles can be removed, allowing us to compare social forms with geometric features more effectively.
Form Analysis in the Brain

The human ventral temporal cortex (VTC) is a key structure in high-level visual processing, including object recognition (Grill-Spector, Kourtzi, & Kanwisher, 2001), reading (Cohen et al., 2000; Wandell, Rauschecker, & Yeatman, 2012), and face perception (Kanwisher, McDermott, & Chun, 1997). Some research suggests that the form analysis system resides in the inferotemporal cortex. Bao and his colleagues (2020) divided the inferotemporal (IT) cortex, the core brain area responsible for object recognition, into four networks. These included not only the established face and body networks but also two new networks: the NML network and the stubby network. However, these four networks cover only about 53% of the IT cortex, leaving much of it unexplained. Another crucial area for object recognition is the lateral occipital cortex (LO). Using fMRI, Ayzenberg and his colleagues (2022) showed that a skeletal model explained significant unique variance in the response of the LO. However, this research used artificial stimuli. To our knowledge, no study has explored the neural basis of the form analysis system using natural stimuli, such as animal skeletons and bare trees, or in the other domains where the form analysis system is thought to operate.

To sum up, this study aimed to identify the neural responses elicited when participants passively viewed various stimuli, including objects (geometric figures, animal bodies, and plants), Chinese words, English words, and Greebles, with the form counterparts of all these stimuli serving as the experimental conditions.
Materials and Methods 
Participants 

Twenty healthy participants were recruited from the university community. Nine participants were excluded from data analysis: one completed the session in the evening, three misunderstood the task, and five had more than 25% invalid trials. The remaining eleven undergraduate and graduate students (7 female; mean age = 22.2 ± 2.61 years, range = 18.7-25.7) were included in the analysis. All participants were right-handed, had normal or corrected-to-normal vision, and had no history of neurological or psychiatric disorders or reading/learning difficulties. This study was approved by the Ethics Committee of our university.
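The exclusion arithmetic reported above can be checked with a few lines of Python. The sketch assumes that the 25% criterion was computed over all 288 trials per participant; the manuscript does not state the denominator or how an invalid trial was defined, so the helper name and its defaults are hypothetical.

```python
N_TRIALS = 288       # trials per participant (see Materials and Procedure)
MAX_INVALID = 0.25   # participants with MORE than 25% invalid trials were excluded


def exceeds_invalid_threshold(n_invalid, n_trials=N_TRIALS, threshold=MAX_INVALID):
    """Hypothetical helper: True if the invalid-trial proportion is strictly above threshold."""
    return n_invalid / n_trials > threshold


recruited = 20
excluded = {"evening session": 1, "misunderstood task": 3, "invalid trials > 25%": 5}
retained = recruited - sum(excluded.values())

print(retained)                 # 11
print(N_TRIALS * MAX_INVALID)   # 72.0, i.e. up to 72 invalid trials are tolerated
print(exceeds_invalid_threshold(72), exceeds_invalid_threshold(73))  # False True
```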
Materials and Procedure 

The stimuli (samples shown in Figure 1) consisted of grayscale images from six object categories (12 exemplars per category): geometric figures, headless animal bodies (in a neutral standing position), plants, Chinese words (from the List of Core Vocabulary in Mandarin Chinese, published by Cambridge Assessment International Education, 2023), English words (from the British Lexicon Project database; Keuleers, Lacey, Rastle, & Brysbaert, 2011), and Greebles (materials reproduced from Professor Tarr's lab; Bukach et al., 2012). Each of the six object categories was presented at two levels, shape and form, and we were particularly interested in the differences between these levels. To clarify this distinction, we defined the more familiar stimuli with outline contours as the shape level (see the bottom row in Fig. 1) and the designed stimuli with inner skeletons as the form level (see the top row in Fig. 1).

Figure 1. Samples of stimuli. 

For the first three categories (geometric figures, headless animal bodies, and plants), the form level was straightforward to define: form-level geometric figures contained the medial skeletal axis, animal bodies included their skeletons, and plants were represented as skeletal bare trees without leaves. Because all form-level stimuli in these three categories incorporated skeletal information, the difference between the shape and form levels was meaningful. In contrast, the last three categories (Chinese words, English words, and Greebles) are man-made and are interpreted according to their typical usage. To make the difference scores for natural and man-made stimuli comparable, the meaningful man-made characters were assigned to the shape level, while their designed inverted-mirrored counterparts were assigned to the form level.

In the Chinese writing system, some characters are mirror images of each other but have different meanings (e.g., “甲” and “由”, “陪” and “部”). To eliminate the influence of such mirror characters on character perception while preserving the overall word form as much as possible, we first inverted the original words in both writing systems and then mirrored the inverted text to create the form-level word stimuli. This transformation process is illustrated in Figure 2. Using novel inverted-mirrored words as control stimuli has an important advantage. Inverted words and mirrored words on their own are overlearned stimuli, so participants' prior experience with typical instances of them could influence the perception of singly transformed words; the combined inverted-mirrored stimuli are novel, and participants received no training on these modified versions. Word length is another potential confound: the processing of a visual word is correlated with its length, usually expressed as the number of letters (Barton et al., 2014; New, Ferrand, Pallier, & Brysbaert, 2006). To control for this, all Chinese words used in this study were two characters long, and all English words were four letters long.


[Figure 2 panel labels: Inverted transform; Mirrored transform]

Figure 2. Schematic of the form-word transformation process. In the second black screen, the word is inverted; the inverted text is then mirrored, preserving the global word shape.
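Assuming that "inverted" here means an upside-down flip about the horizontal axis (the manuscript does not specify whether a flip or a 180° rotation was used), inverting and then mirroring a word image is equivalent to rotating it by 180°, which keeps every stroke and the overall extent of the word intact. The minimal sketch below checks this on a toy binary glyph; for real bitmaps, Pillow's `ImageOps.flip` and `ImageOps.mirror` perform the same two operations. If "inverted" instead meant a 180° rotation, the combined transform would reduce to a plain upside-down flip.

```python
def invert(img):
    """Upside-down flip: reverse the order of the rows (top <-> bottom)."""
    return [row[:] for row in img[::-1]]


def mirror(img):
    """Left-right flip: reverse each row."""
    return [row[::-1] for row in img]


def rotate180(img):
    """Rotate by 180 degrees: reverse the rows and each row."""
    return [row[::-1] for row in img[::-1]]


# Toy 3 x 3 binary "glyph" (1 = ink).
glyph = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
form_version = mirror(invert(glyph))
print(form_version)  # [[1, 1, 1], [0, 0, 1], [0, 1, 1]]
print(form_version == rotate180(glyph))  # True
```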

For the form condition of the man-made face-like stimuli (Greebles), we removed the facial features using Photoshop while keeping all other information identical to that of the shape counterparts.

Stimuli from all 12 conditions (six object categories × two levels) were presented in random order. Participants were instructed to watch the screen and remain still. A total of 288 trials were presented, divided into two blocks separated by a 30-second rest. On each trial, the stimulus was presented for 500 ms, followed by a fixation cross for a random duration between 900 ms and 1100 ms.
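A trial sequence matching this design can be sketched as follows. The sketch assumes that each of the 144 images (6 categories × 2 levels × 12 exemplars) appeared once per block and that the fixation duration was drawn uniformly from 900-1100 ms; neither assumption is stated in the manuscript, and the condition labels are hypothetical.

```python
import random
from collections import Counter

CATEGORIES = ["geometric", "body", "plant", "chinese", "english", "greeble"]
LEVELS = ["shape", "form"]
N_EXEMPLARS = 12
STIM_MS = 500


def make_session(seed=0):
    """Return (block, category, level, exemplar, fixation_ms) tuples for 288 trials.

    Assumption: every image appears once per block, in a fresh random order.
    """
    rng = random.Random(seed)
    images = [(cat, lev, ex) for cat in CATEGORIES for lev in LEVELS for ex in range(N_EXEMPLARS)]
    trials = []
    for block in (1, 2):
        order = images[:]
        rng.shuffle(order)  # all 12 conditions intermixed within the block
        for cat, lev, ex in order:
            fixation_ms = rng.uniform(900, 1100)  # jittered fixation cross
            trials.append((block, cat, lev, ex, fixation_ms))
    return trials


trials = make_session()
per_condition = Counter((cat, lev) for _, cat, lev, _, _ in trials)
print(len(trials), set(per_condition.values()))  # 288 {24}

mean_trial_s = (STIM_MS + 1000) / 1000  # stimulus + mean fixation duration, in seconds
print(len(trials) * mean_trial_s / 60)  # approximate minutes of stimulation
```

Under these assumptions each condition contributes 24 trials, and the stimulation itself lasts about 7.2 minutes (288 × 1.5 s on average), excluding the rest period.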
EEG recording 

EEG was recorded continuously using a Neuroscan Grael amplifier (512 Hz sampling rate; Cz reference) from 29 Ag/AgCl scalp electrodes mounted in an elastic cap and positioned according to the extended 10-20 system. The montage included 5 midline sites (Fz, Cz, CPz, Pz, Oz) and 12 sites over each hemisphere (Fp1/Fp2, F3/F4, F7/F8, FC3/FC4, FT7/FT8, T7/T8, C3/C4, TP7/TP8, CP3/CP4, P3/P4, P7/P8, and O1/O2). Additional electrodes served as ground and reference and recorded the electrooculogram (EOG). Electrodes on the mastoids (M1/M2) were recorded but not used for re-referencing in this study.
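The montage description can be checked as a configuration list: 5 midline sites plus 12 homologous pairs gives 5 + 2 × 12 = 29 scalp electrodes. The sketch below is plain Python; in MNE-Python such a list could be passed to `mne.create_info(ch_names, sfreq=512, ch_types="eeg")` and positioned with `mne.channels.make_standard_montage("standard_1020")`, although the actual recording configuration files are not given in the manuscript.

```python
MIDLINE = ["Fz", "Cz", "CPz", "Pz", "Oz"]
PAIRS = ["Fp1/Fp2", "F3/F4", "F7/F8", "FC3/FC4", "FT7/FT8", "T7/T8",
         "C3/C4", "TP7/TP8", "CP3/CP4", "P3/P4", "P7/P8", "O1/O2"]

# In the 10-20 naming convention, odd numbers are left and even numbers are right.
left = [pair.split("/")[0] for pair in PAIRS]
right = [pair.split("/")[1] for pair in PAIRS]
scalp_channels = left + MIDLINE + right

print(len(MIDLINE), len(PAIRS), len(scalp_channels))  # 5 12 29
```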
EEG preprocessing 

All EEG preprocessing was performed with MNE-Python v1.5.1 (Gramfort et al., 2013) in Python v3.12.0. The raw continuous EEG data were band-pass filtered in two ways: once with a 1 Hz high-pass cut-off for artifact identification with independent component analysis (ICA), and once with a 0.2 Hz high-pass cut-off for further analysis. In both cases, the low-pass cut-off was 30 Hz. ICA was applied to the continuous 1-30 Hz filtered data using the FastICA algorithm (Hyvärinen, 1999), with the number of components set to explain 99% of the variance in the data. The ICA decomposition was then applied to the 0.2-30 Hz filtered data, and the EOG components were removed before further analysis. The ICA-corrected data were segmented into epochs from 200 ms before to 800 ms after stimulus onset. All epochs were baseline-corrected and re-referenced to the average of all electrodes.
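The last two preprocessing steps, baseline correction over the 200 ms pre-stimulus interval and re-referencing to the average of all electrodes, are simple linear operations. The sketch below implements both on plain lists (channels × samples) to show what they do; it is illustrative only, and the function names are hypothetical. In MNE-Python the same steps would typically correspond to `mne.Epochs(..., tmin=-0.2, tmax=0.8, baseline=(None, 0))` followed by `epochs.set_eeg_reference("average")`, and the ICA step to `mne.preprocessing.ICA(n_components=0.99, method="fastica")`, but the exact calls used in the study are not reported.

```python
def baseline_correct(epoch, times, tmin=-0.2, tmax=0.0):
    """Subtract each channel's mean over the pre-stimulus interval [tmin, tmax] (seconds)."""
    idx = [i for i, t in enumerate(times) if tmin <= t <= tmax]
    if not idx:
        raise ValueError("no samples in the baseline interval")
    corrected = []
    for channel in epoch:
        base = sum(channel[i] for i in idx) / len(idx)
        corrected.append([v - base for v in channel])
    return corrected


def average_reference(epoch):
    """Subtract, at every sample, the mean across all electrodes."""
    n_ch = len(epoch)
    means = [sum(samples) / n_ch for samples in zip(*epoch)]
    return [[v - m for v, m in zip(channel, means)] for channel in epoch]


# Toy epoch: 2 channels x 5 samples (real epochs span -0.2 to 0.8 s at 512 Hz).
times = [-0.2, -0.1, 0.0, 0.1, 0.2]
epoch = [
    [1.0, 1.0, 1.0, 3.0, 5.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
]
a = average_reference(baseline_correct(epoch, times))
b = baseline_correct(average_reference(epoch), times)
print(a)       # [[0.0, 0.0, 0.0, 1.0, 2.0], [0.0, 0.0, 0.0, -1.0, -2.0]]
print(a == b)  # True
```

Because both operations are linear, applying them in either order gives the same result (up to floating-point rounding), as the final comparison shows.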
Results 

The grand-average ERP waveforms elicited by the 12 conditions appeared generally similar, exhibiting a classic P1-N1-P2 pattern at lateral posterior electrode sites (Figure 3).

Figure 3. Grand-average waveforms. ERPs to shape (solid lines) and form (dotted lines) stimuli for each category at the nine ROIs. Categories are color-coded as follows: geometric figures in blue, animal bodies in dodger blue, plants in green, Chinese characters in lime green, English words in orange-red, and Greebles in orange. Positive values are plotted upwards.

Because we were interested in the ERP differences between the shape and form levels and their interaction with the six object categories (the Target factor), we examined difference waves created by subtracting the shape condition from the form condition for each target category. The difference waves are shown in Figure 4, and the corresponding scalp topographic maps are presented in Figure 5.
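The form-minus-shape difference wave is a sample-by-sample subtraction. The minimal sketch below, using made-up amplitudes for two hypothetical participants, also shows that averaging the individual difference waves gives the same result as subtracting the grand averages, because both operations are linear. The function names are hypothetical.

```python
def difference_wave(form_erp, shape_erp):
    """Form-minus-shape difference wave for one category at one electrode."""
    if len(form_erp) != len(shape_erp):
        raise ValueError("ERPs must have the same number of samples")
    return [f - s for f, s in zip(form_erp, shape_erp)]


def grand_average(waves):
    """Sample-wise mean across participants (all waves must have equal length)."""
    return [sum(samples) / len(waves) for samples in zip(*waves)]


# Made-up ERPs (microvolts) for two hypothetical participants, one category, one electrode.
form = [[1.0, -2.0, 0.5], [3.0, 0.0, 0.5]]
shape = [[0.5, -1.0, 0.5], [1.5, 1.0, -0.5]]

mean_of_differences = grand_average([difference_wave(f, s) for f, s in zip(form, shape)])
difference_of_means = difference_wave(grand_average(form), grand_average(shape))
print(mean_of_differences)  # [1.0, -1.0, 0.5]
print(difference_of_means)  # [1.0, -1.0, 0.5]
```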

Figure 4. Difference waveforms. The ERPs for the shape condition subtracted from those for the form condition are shown for different categories: geometric figures in blue, animal bodies in dodger blue, plants in green, Chinese characters in lime green, English words in orange-red, and Greebles in orange, at the nine ROIs. Positive values are plotted upwards.
Figure 5. Topographic maps derived from the difference waveforms. The ERPs for the shape condition subtracted from those for the form condition are shown for each object category in the analyzed time windows (-100 to 0 ms, 0 to 100 ms, 100 to 200 ms, 200 to 300 ms, 300 to 400 ms, 400 to 500 ms, 500 to 600 ms, 600 to 700 ms, and 700 to 800 ms). Negativities are depicted in blue and positivities in amber.

The topographic maps (Figure 5) showed that the first three categories (geometric figures, animal bodies, and plants) shared similar difference waves 100-200 ms after stimulus onset. Specifically, greater negativity was observed over the vertex, extending to posterior sites, from approximately 100 to 200 ms, while greater positivity appeared at anterior sites. However, the last three categories (Chinese characters, English words, and Greebles) did not exhibit a common pattern. Figure 4 indicated that the shared pattern was most evident at lateral posterior sites, especially over the right hemisphere. Indeed, a negative wave was shared among geometric figures (blue), animal bodies (dodger blue), and plants (green).

To pinpoint the channels showing the shared ERP, we separately compared the parent waves with the difference waves at the lateral posterior sites P3, P4, P7, P8, O1, and O2 (Figure 6). The results indicated that this ERP was most pronounced at the O1 and O2 electrodes.
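The channel comparison above was made by inspecting the waveforms. A simple quantitative complement, offered here only as a hypothetical sketch and not as the analysis reported in this study, is to locate the most negative point of each channel's difference wave within the 100-200 ms window. The channel names follow the montage; the amplitudes are made up.

```python
def most_negative_channel(diff_waves, times, tmin=0.100, tmax=0.200):
    """Return (channel, latency_s, amplitude) for the most negative sample of any
    difference wave within [tmin, tmax]; diff_waves maps channel -> samples."""
    best = None
    for channel, wave in diff_waves.items():
        for t, v in zip(times, wave):
            if tmin <= t <= tmax and (best is None or v < best[2]):
                best = (channel, t, v)
    return best


times = [0.05, 0.10, 0.15, 0.20, 0.25]
diff_waves = {  # hypothetical form-minus-shape amplitudes (microvolts)
    "P7": [0.0, -1.0, -2.0, -1.0, -5.0],  # the -5.0 sample falls outside the window
    "O1": [0.0, -2.0, -3.0, -1.0, 0.0],
    "O2": [0.0, -1.0, -4.0, -2.0, 0.0],
}
print(most_negative_channel(diff_waves, times))  # ('O2', 0.15, -4.0)
```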

Figure 6. Parent waveforms and difference waveforms. The ERPs for geometric 
figures (blue), animal bodies (dodger blue), plants (green), Chinese characters (lime 
green), English words (orange red), and Greebles (orange) at the six channels: P3/P4, 
P7/P8, and O1/O2. Positive values are plotted upwards. 

We computed mean amplitudes over a set of a priori time windows covering the post-stimulus period from 100 to 800 ms: 100-300 ms, 300-500 ms, 500-700 ms, and 700-800 ms. The data were imported into R (version 4.3.0) and analyzed with linear mixed-effects models using the bam function of the mgcv package (version 1.8-42; Wood, 2017; Tremblay & Newman, 2015). Three candidate models were fitted for each time window and compared using the Akaike information criterion (AIC; Akaike, 1973), which balances goodness of fit against model complexity. For each time window, the best model included fixed effects of Target (geometric figures/animal bodies/plants/Chinese characters/English words/Greebles), Level (shape/form), and baseline, together with all possible interactions among these variables, as well as by-subject random intercepts (Table 1). In all analyses, interaction effects involving the baseline variable were excluded from further interpretation, so that baseline served only as a control variable.
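The two computational steps in this paragraph, averaging amplitudes within the a priori windows and choosing among candidate models by AIC, can be sketched as follows. This is a minimal illustration, not the authors' R code: the helper names, the ramp waveform, and the log-likelihoods and parameter counts of the three candidate models are hypothetical. In practice the statistical software computes these quantities from each fitted model; the sketch uses the textbook definition AIC = 2k - 2 ln L.

```python
WINDOWS_MS = [(100, 300), (300, 500), (500, 700), (700, 800)]

def window_means(wave, times_ms, windows=WINDOWS_MS):
    """Mean amplitude of one waveform in each a priori window (start <= t < end)."""
    result = {}
    for start, end in windows:
        values = [v for v, t in zip(wave, times_ms) if start <= t < end]
        result[(start, end)] = sum(values) / len(values)
    return result

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def select_model(candidates):
    """candidates maps a model name to (log-likelihood, number of parameters).
    Returns the name with the lowest AIC and the AIC of every candidate."""
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# A linear ramp makes each window mean easy to check by hand.
times = list(range(-100, 800, 2))
ramp = [float(t) for t in times]
print(window_means(ramp, times)[(100, 300)])  # 199.0

# Hypothetical log-likelihoods and parameter counts for three candidate models.
best, scores = select_model({
    "additive": (-1520.0, 9),
    "two-way": (-1498.0, 19),
    "three-way": (-1490.0, 25),
})
print(best)  # three-way
```

Lower AIC is better: a richer model wins only if its gain in log-likelihood outweighs the penalty of two AIC units per extra parameter.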

Table 1. Results of the LME model for Level, Target, and baseline at the O1/O2 electrodes.

Time window  Effect                  df  F       p-value
100-300 ms   Level                   1   4.48    0.0340
             Target                  5   63.86   < 0.0001
             baseline                1   22.97   < 0.0001
             Level:Target            5   8.92    < 0.0001
             Level:baseline          1   4.33    0.0380
             Target:baseline         5   0.39    0.8550
             Level:Target:baseline   5   2.80    0.0160
300-500 ms   Level                   1   0.39    0.5347
             Target                  5   24.18   < 0.0001
             baseline                1   7.78    0.0053
             Level:Target            5   9.72    < 0.0001
             Level:baseline          1   0.43    0.5130
             Target:baseline         5   0.77    0.5682
             Level:Target:baseline   5   1.36    0.2355
500-700 ms   Level                   1   6.23    0.0126
             Target                  5   11.35   < 0.0001
             baseline                1   10.65   0.0011
             Level:Target            5   3.82    0.0019
             Level:baseline          1   0.20    0.6511
             Target:baseline         5   1.21    0.2998
             Level:Target:baseline   5   1.58    0.1624
700-800 ms   Level                   1   5.10    0.0240
             Target                  5   2.02    0.0730
             baseline                1   16.07   0.0001
             Level:Target            5   1.48    0.1940
             Level:baseline          1   0.29    0.5890
             Target:baseline         5   1.98    0.0790
             Level:Target:baseline   5   2.28    0.0440

Note: Level = shape and form; Target = geometric figures, animal bodies, plants, Chinese characters, English words, and Greebles.


Significant interactions involving Target and Level were further analyzed by computing the form-shape contrast for each target at the averaged O1 and O2 region of interest (Table 2). The p-values of these contrasts were corrected using the Benjamini and Hochberg (1995) method.
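A minimal Python sketch of the Benjamini-Hochberg adjustment is given below. It assumes, as the values in Table 2 appear to indicate, that the six target contrasts within each time window were treated as one family (m = 6); under that assumption it reproduces the corrected p-values reported for the 700-800 ms window. Because the raw p-values in Table 2 are rounded to four decimals, recomputing other windows can differ in the last digit (for example, 0.0012 rather than 0.0013 for bodies at 100-300 ms).

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (1995) adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and keeping the
    # adjusted values monotone (and capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Raw p-values for the 700-800 ms window in Table 2
# (geo, bodies, plants, Chinese, English, Greebles).
raw = [0.0248, 0.3734, 0.7088, 0.7193, 0.0397, 0.7038]
print([round(p, 4) for p in benjamini_hochberg(raw)])
# [0.1191, 0.7193, 0.7193, 0.7193, 0.1191, 0.7193]
```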


Table 2. Form-shape contrasts for each Target at the averaged O1/O2 region of interest.

Time window  Target    t        p (raw)    p (FDR, BH)  Effect size  SE (effect size)  Lower CL  Upper CL
100-300 ms   geo       -2.1480  0.0318     0.0384       -0.1373      0.0639            -0.2627   -0.0120
             bodies    -3.5154  0.0004     0.0013       -0.2236      0.0636            -0.3483   -0.0989
             plants    -7.0476  < 0.0001   < 0.0001     -0.4499      0.0638            -0.5750   -0.3247
             Chinese   -2.7285  0.0064     0.0128       -0.1736      0.0636            -0.2984   -0.0489
             English    2.1447  0.0320     0.0384        0.1355      0.0632             0.0116    0.2593
             Greebles  -1.5901  0.1119     0.1119       -0.1022      0.0643            -0.2282    0.0238
300-500 ms   geo       -0.6315  0.5277     0.6333       -0.0405      0.0641            -0.1663    0.0852
             bodies    -0.8672  0.3859     0.5788       -0.0553      0.0638            -0.1804    0.0697
             plants    -5.8397  < 0.0001   < 0.0001     -0.3755      0.0643            -0.5015   -0.2494
             Chinese    0.3930  0.6944     0.6944        0.0251      0.0640            -0.1003    0.1506
             English    3.5685  0.0004     0.0011        0.2274      0.0637             0.1025    0.3523
             Greebles  -2.3778  0.0174     0.0349       -0.1542      0.0649            -0.2814   -0.0271
500-700 ms   geo       -2.5046  0.0123     0.0246       -0.1617      0.0646            -0.2883   -0.0351
             bodies    -0.7939  0.4273     0.6410       -0.0509      0.0642            -0.1767    0.0748
             plants    -4.1911  < 0.0001   0.0002       -0.2689      0.0642            -0.3948   -0.1431
             Chinese    0.5011  0.6163     0.6862        0.0324      0.0646            -0.0943    0.1591
             English   -0.4040  0.6862     0.6862       -0.0260      0.0644            -0.1522    0.1002
             Greebles  -3.9787  0.0001     0.0002       -0.2557      0.0643            -0.3817   -0.1297
700-800 ms   geo       -2.2448  0.0248     0.1191       -0.1462      0.0651            -0.2739   -0.0185
             bodies     0.8901  0.3734     0.7193        0.0576      0.0647            -0.0693    0.1845
             plants    -0.3735  0.7088     0.7193       -0.0242      0.0648            -0.1513    0.1029
             Chinese   -0.3595  0.7193     0.7193       -0.0234      0.0651            -0.1510    0.1042
             English   -2.0575  0.0397     0.1191       -0.1331      0.0647            -0.2598   -0.0063
             Greebles  -0.3802  0.7038     0.7193       -0.0247      0.0649            -0.1520    0.1026
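The confidence limits in Table 2 appear consistent with a large-sample interval of the form estimate ± critical value × SE. The Python sketch below assumes a critical value of 1.96 (normal approximation); the software's exact critical value is not stated, and a t quantile with the model's degrees of freedom would be very slightly larger, which is why the recomputed limits for the geo row at 100-300 ms (-0.2625, -0.0121) differ from the tabulated ones (-0.2627, -0.0120) by about 0.0002. The function names are hypothetical.

```python
def wald_ci(estimate, se, crit=1.96):
    """Two-sided interval estimate +/- crit * SE (crit = 1.96 gives ~95%
    under a normal approximation)."""
    half_width = crit * se
    return estimate - half_width, estimate + half_width

def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0

# geo, 100-300 ms row of Table 2: Effect size = -0.1373, SE = 0.0639
low, high = wald_ci(-0.1373, 0.0639)
print(round(low, 4), round(high, 4), excludes_zero(low, high))
# -0.2625 -0.0121 True
```

An interval that excludes zero corresponds to a significant uncorrected contrast; for Greebles at 100-300 ms (-0.1022, SE 0.0643) the interval spans zero, in line with its non-significant p-value.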

100-300ms

Negativity was prominent in the difference waveforms and topographic maps, especially in the earliest part of this window (100-200 ms). Both the main effect of Level (F(1) = 4.48, p = 0.0340) and the main effect of Target (F(5) = 63.86, p < 0.0001) were significant, as was the Level × Target interaction (F(5) = 8.92, p < 0.0001). The form-shape contrasts for each Target condition are shown in Table 2. At the averaged O1 and O2 electrodes, geometric figures, animal bodies, plants, and Chinese characters elicited significantly greater negativity in the form condition than in the shape condition, whereas English words elicited significantly greater positivity in the form condition than in the shape condition. The contrast for Greebles was not significant.
300-500ms 

In this time window, the main effect of Level (form vs. shape) was not significant (F(1) = 0.39, p = 0.5347). However, the main effect of Target (F(5) = 24.18, p < 0.0001) and the Level × Target interaction (F(5) = 9.72, p < 0.0001) remained significant. The interaction was attributable to greater form-related negativity for plants and Greebles and greater form-related positivity for English words (Table 2).
500-700ms 

In this time window, both the main effect of Level (F(1) = 6.23, p = 0.0126) and the main effect of Target (F(5) = 11.35, p < 0.0001) were significant, and the Level × Target interaction was also significant (F(5) = 3.82, p = 0.0019). Further contrast analyses showed that geometric figures, plants, and Greebles elicited greater negativity at the O1 and O2 channels in the form condition than in the shape condition.
700-800ms 

In the final time window, the main effect of Level was still significant (F(1) = 5.10, p = 0.0240), which may reflect the greater negativity for geometric figures and English words; however, after FDR correction, no form-shape contrast was significant for any Target condition.
Discussion 


To answer whether there is a specific brain signal corresponding to the form analysis system, the present study designed form and shape conditions for various stimuli, including objects (geometric figures, headless standing animals, and plants), words (mirror-inverted Chinese words and English words), and artificial face stimuli (Greebles). We used EEG to probe adults' brain responses to these different kinds of stimuli. Our results are consistent with previous findings that different brain areas are involved in visual recognition when the stimuli vary among objects, words, and faces (Pegna et al., 2004; for a review, see Farah, 1994).

We found that the form analysis system was limited to object categories, including 2D geometric figures, animal bodies, and plants, and was not observed for written words or Greebles. When the parent waveforms in the form condition were compared with those in the shape condition, a similar pattern of difference waves was observed for 2D geometric figures, animal bodies, and plants at the O1 and O2 channels. These difference waves appeared 100-200 ms after stimulus onset and peaked negatively over the occipital lobe. To our knowledge, this is the first demonstration of the form analysis system across different types of object stimuli. Although the pattern seemed similar at first glance, adults exhibited a longer-lasting negative difference wave for plants than for the other two object categories. For geometric figures and animal bodies, the significant early negativity lasted from 100 to 300 ms, whereas for plants it persisted from 100 to 700 ms. We are not aware of any study that has investigated human brain responses to plants. We were surprised by the prolonged negativity elicited by plant stimuli and could not determine its underlying cause. Do plants evoke a greater sense of calm in humans, or do they carry a more ancient meaning than other kinds of object stimuli? These questions remain unexplored.

The form-shape difference waves for words lagged those for objects by 100 ms and showed opposite polarities for the first and second languages. For our Chinese participants, Chinese words elicited a negative difference wave, whereas English words elicited a positive one. During interviews, one participant reported that, although the stimuli were transformed, first-language words could still be recognized, whereas second-language words could not be understood before they disappeared. Thus, the significant positivity from 300 to 500 ms elicited by the second language may reflect participants' surprise at being unable to work out the meaning of the words (Lau, Phillips, & Poeppel, 2008). It remains to be seen whether a similar effect exists in English-speaking participants.

For the artificial face stimuli (Greebles), the form-shape negativity did not emerge until 300 ms. Most ERP studies have described an earlier response to faces at around 170 ms, known as the N170, characterized by a vertex-positive and bilateral temporal-negative deflection (Bentin et al., 1996; Eimer, 2000; Pegna et al., 2002). This discrepancy may be due to the design of the Greeble form stimuli. For the form condition of these man-made face stimuli, we removed the facial features using Photoshop while keeping all other information the same as in the shape counterparts. However, this operation removed the core form structure of face stimuli, such as the eyes and mouth. Future studies should retain the facial features and instead remove the geometric bodies of the Greeble stimuli.

The limited existence of the form analysis system, present for objects but not for words and faces, may be due to the asymmetric characteristics of these stimuli. The skeletons of the three types of object stimuli can be extracted by the form analysis system: whether 2D geometric figures, animal bodies, or plants, the skeletons of all these stimuli fit well with the medial axis and grassfire model proposed by Blum. However, Blum's medial axis and grassfire model does not fit word or face stimuli as well. Although we assume that words and faces also have a form or skeleton, their skeletons do not conform to the medial axis. This may be the core reason why the form analysis system is limited to objects and does not apply to words and faces.
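Blum's grassfire idea can be made concrete with a small discrete sketch: a fire lit simultaneously along the whole boundary of a binary shape burns inward at constant speed, and the points where fire fronts meet (ridges of the burn-time map) approximate the medial axis. The Python code below is a simplified illustration under stated assumptions, not the model used in this study: it uses city-block distance on a pixel grid, computed by breadth-first search, and treats a foreground pixel as a skeleton point when its burn time is at least that of each of its four neighbors. The function names and the test shape are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def grassfire(shape):
    """Burn time (city-block distance to the background) for each pixel.

    `shape` is a list of equal-length rows of 0/1 values. It is padded with a
    ring of background so that the image border also acts as a fire source.
    """
    rows, cols = len(shape) + 2, len(shape[0]) + 2
    padded = [[0] * cols for _ in range(rows)]
    for r, row in enumerate(shape):
        for c, value in enumerate(row):
            padded[r + 1][c + 1] = 1 if value else 0
    # Background pixels ignite at time 0; foreground pixels are not yet burnt.
    burn = [[0 if padded[r][c] == 0 else None for c in range(cols)]
            for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if padded[r][c] == 0)
    while queue:  # multi-source breadth-first search: the fire spreads inward
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and burn[nr][nc] is None:
                burn[nr][nc] = burn[r][c] + 1
                queue.append((nr, nc))
    return padded, burn

def medial_axis(shape):
    """Foreground pixels whose burn time is a local maximum (quench points),
    returned in the coordinates of the unpadded input."""
    padded, burn = grassfire(shape)
    skeleton = set()
    for r in range(1, len(padded) - 1):
        for c in range(1, len(padded[0]) - 1):
            if padded[r][c] and all(burn[r][c] >= burn[r + dr][c + dc]
                                    for dr, dc in NEIGHBORS):
                skeleton.add((r - 1, c - 1))
    return skeleton

bar = [[1] * 7 for _ in range(3)]  # a 3 x 7 filled rectangle
print(sorted(medial_axis(bar)))
# [(0, 0), (0, 6), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 0), (2, 6)]
```

For the rectangle, the central row is the main medial segment and the four corner pixels are the discrete ends of the short diagonal branches that a rectangle's medial axis sends into its corners. Discrete skeletons of this kind are sensitive to the distance metric and to boundary noise, which is why practical implementations usually add smoothing or pruning.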

However, although this medial axis account explains why the lateral occipital (LO) negativity was limited to the object domain and did not extend to the writing systems or face stimuli, it should not be over-interpreted. More experiments are needed before these results are generalized to a wider domain. Following Bao et al. (2020), at least four kinds of categories spanning two dimensions (animacy and spikiness) should be tested. Could this negativity also be observed in other domains that fit well with the medial axis model, such as man-made categories or silhouettes?

It should be noted that, because of the low spatial resolution of EEG, localization in the present study is unavoidably imprecise. Future studies using fMRI should therefore pinpoint the brain areas activated by the form analysis system. The present results suggest that the form analysis system may reside in the lateral occipital cortex; however, its subsequent processing pathways to the temporal or parietal lobe remain unclear. Traditionally, researchers have held that the ventral occipitotemporal pathway processes object properties such as shape, texture, and color, whereas the dorsal occipitoparietal pathway encodes the spatial and temporal information of objects, such as motion and the spatial relations among parts. Converging evidence, however, shows that both the ventral and the dorsal pathways contribute to object recognition (Ayzenberg, Simmons, & Behrmann, 2023; for a review, see Freud, Behrmann, & Snow, 2020). Recent studies have revealed that the dorsal pathway also represents global shape (Ayzenberg & Behrmann, 2022). Moreover, both infants (Bertenthal, Proffitt, & Kramer, 1987) and adults (Johansson, 1950) are sensitive to biological motion presented in point-light displays, and patients with dorsal pathway lesions but intact ventral pathways are selectively impaired in biological motion perception (Vaina et al., 1990). We therefore suggest that the dorsal pathway may also play a pivotal role in perceiving form skeletons. Further studies should investigate the respective roles of the ventral and dorsal pathways in the form analysis system.

The present study aimed to identify the neural basis of the form analysis system in human vision rather than to explore the developmental process that builds this system. How might biological development implement this process? Despite tremendous variation in visual images, the form analysis system can represent the same object at different retinal positions and in different poses, distances, and sizes. Evidence from animals (Wood, 2013) and human infants (Slater et al., 1990) shows that powerful, robust, and invariant skeleton-based object recognition machinery is an inherent feature of the newborn brain. Future studies could therefore use EEG to explore the neural basis of the form analysis system in newborns and determine whether it is part of core knowledge, present before the onset of visual experience.

Finally, achieving machine vision comparable to human vision remains one of the most challenging problems in artificial intelligence. Several studies have compared deep convolutional neural networks with human vision, and converging evidence suggests that, unlike humans, these models do not classify objects on the basis of global object shape (Baker, Lu, Erlikhman, & Kellman, 2018, 2020; Lowet, Firestone, & Scholl, 2018; Xu & Vaziri-Pashkam, 2021). For instance, when presented with an image that has the shape of a cat but is filled with elephant texture, most people categorize it as a cat, whereas such models typically classify it as an elephant. Our work may shed new light on machine vision: if a form analysis system exists in human vision, a form analysis model could be developed and applied in the machine domain, and machine vision might one day see and categorize objects as effectively as human vision does.

Conclusions 

The present study revealed a form analysis system in the object domain, including geometric figures, headless animal bodies, and plants, but not in different writing systems (i.e., Chinese and English) or Greebles. A shared negativity over the occipital lobe from 100 to 200 ms after stimulus onset was observed across the three object categories. These results are well explained by the medial axis model, which suggests that the form analysis system may extend to a wider range of categories, provided that they fit well with the medial axis model.